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Abstract 

Motivated by the needs of selecting important features for massive neuroimaging 
data, we propose a spatially varying coefficient model (SVCMs) with sparsity and piece- 
wise smoothness imposed on the coefficient functions. A new class of nonparametric 
priors is developed based on thresholded multiscale Gaussian processes (TMGP). We 
show that the TMGP has a large support on a space of sparse and piecewise smooth 
functions, leading to posterior consistency in coefficient function estimation and feature 
selection. Also, we develop a method for prior specifications of thresholding parame¬ 
ters in TMGPs. Efficient posterior computation algorithms are developed by adopting 
a kernel convolution approach, where a modified square exponential kernel is chosen 
taking the advantage that the analytical form of the eigen decomposition is available. 

Based on simulation studies, we demonstrate that our methods can achieve better 
performance in estimating the spatially varying coefficient. Also, the proposed model 
has been applied to an analysis of resting state functional magnetic resonance imaging 
(Rs-fMRI) data from the Autism Brain Imaging Data Exchange (ABIDE) study, it 
provides biologically meaningful results. 
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1 Introduction 


Recent advancements in biomedical imaging technologies have provided abundant infor¬ 
mation and extensive resources for researchers to learn the human brain and neurological 
diseases. A variety of imaging modalities, such as magnetic resonance imaging (MRI), dif¬ 
fusion tensor imaging (DTI), functional magnetic resonance imaging (fMRI) and positron 
emission tomography (PET) have been developed to measure brain structures and functions 
from different perspectives, generating various large-scale spatially distributed measurements 
over a three dimensional (3D) space of the human brain. We refer to those massive spatial 
measurements of brain as neuroimages. This poses great opportunities and new challenges 
for neuroscientists and statisticians to develop efficient analytical methods that extract use¬ 
ful features from neuroimages to characterize the association between the brain activities 
and neurological diseases. To this end, regression analysis, a general and flexible modeling 
framework for studying the association among variables, has been investigated and consid¬ 
ered as a powerful tool in the analysis of massive neuroimaging data, where neuroimages are 
modeled as outcome variables; and the disease status along with the clinical, biological and 
demographical information can all potentially be predictors. 

A pioneer work using the regression model for the neuroimaging data is the mass uni¬ 
variate analysis (MUA). This approach fits a general linear model (GLM) at each spatial 
location in the brain (to which is referred as a voxel) and obtains massive test statistics over 
space to identify voxels/regions that are significantly associated with a specific covariate, 
which requires multiple comparisons correction. One standard procedure is to calculate the 
family-wise error rate (EWER) based on the random field theory for statistical parametric 
maps (Friston et al., 1995; Worsley et ah, 2004; Lazar, 2008). Another approach is to control 
the false discovery rate (FDR) using the observed p-values (Benjamin! and Yekutieli, 2001; 
Genovese et al., 2002). A major drawback of MUA is that the models do not borrow infor¬ 
mation from the spatial dependence across brain locations. In practice, the neuroimaging 
data are usually pre-processed by a spatial smoothing procedure using a kernel convolution 
approach. Performing MUA on these pre-smoothed data may lead to inaccuracy and low 
efficiency in terms of estimating and testing the covariate effects (Ghumbley et ah, 2009; Li 
et al., 2011). Recent development in adaptive smoothing methods for preprocessing (Yue 
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et al., 2010) and estimation (Polzehl and Spokoiny, 2000; Qin, 2007; Tabelow et al., 2008b,a; 
Li et al., 2011; Wang et al., 2013) may improve the performance in terms of redncing noise 
and preserving featnres. It is especially powerfnl to detect delicate featnres snch as jnmp 
discontinnities, which is one of the nniversal characteristics for nenroimaging data (Chan 
and Shen, 2005; Tabelow et ah, 2008a,b; Chnmbley et ah, 2009). 

To achieve a similar goal in the analysis of nenroimaging data, Zhn et al. (2014) recently 
developed a systematic modeling approach nsing a novel spatially varying coefficient model 
(SVCM) which incorporates both spatial dependence and piecewise smooth covariate effects. 
General SVCMs have been extensively investigated and developed for different applications 
in environmental heath, epidemiology, ecology and geographical stndies as demonstrated in 
Cressie and Cassie (1993); Diggle et al. (1998); Gelfand et al. (2003); Smith et al. (2002). The 
SVCM encompasses a wide range of regression models with the ontcome variable observed 
over space and the regression coefficients modeled as fnnctions varying spatially. We refer to 
this type regression coefficients as spatially varying coefficient fnnctions (SVCFs). SVCFs are 
commonly assnmed to be smooth fnnctions or p times continnously differentiable fnnctions 
with p > 1 (we will not make this distinction thronghont the rest of this paper nnless 
noted). To model to SVCFs, smooth spatial processes are nsnally employed to characterize 
the dependence strnctnre over space. A pnre noise process is typically introdnced to captnre 
random variabilities, i.e. the “spatial nngget effect”. Zhn et al. (2014) extended the general 
SVCMs by introdncing jnmp discontinnities into the SVCFs, making the model especially 
nsefnl for nenroimaging data analysis. Based on stepwise mnltiscale estimating procednres 
and asymptotic Wald tests, Zhn et al. (2014) ’s SVCM also can identify the brain regions that 
are signihcantly associated with the given covariates, althongh it is not developed particnlarly 
for featnre selection. 

In this article, we aim to develop a Bayesian featnre selection method for large-scale 
nenroimaging data that can directly select imaging featnres associated with covariates of 
interest. Regnlarization methods have been stndied extensively for variable selection in re¬ 
gression models (Tibshirani, 1996; Fan and Li, 2001; Zon, 2006; Candes and Tao, 2007). 
Bayesian methods have also been developed based on various prior specihcations. Mitchell 
and Beauchamp (1988) developed a prior model for linear model coefficients using the mix- 
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ture of a uniform distribution (slab) and a point mass at zero (spike), which is broadly 
referred to as the spike-and-slab type of priors. George and McCulloch (1993) proposed 
to use the scale mixture of two zero-mean Gaussian distributions and developed posterior 
computation algorithm based on Gibbs sampling. Relative works also include but not lim¬ 
ited to Ishwaran and Rao (2005); Liang et ah (2008); Park and Casella (2008); Hans (2009); 
Bonded and Reich (2012); Johnson and Rossell (2012); Armagan et ah (2013); Poison et ah 
(2014); Narisetty et ah (2014). Most of these priors were initially introduced for indepen¬ 
dent regression coefficients. In light of the needs of integrating complex data structure in 
many applications, recent development of Bayesian variable selection incorporates depen¬ 
dence structures into the prior model. Li and Zhang (2010) assumed that covariates lay on 
an undirected graph and used the Ising prior to incorporate this information to the model 
space and applied this method to analyze the genomics data. For the modeling of spatial 
data, Markov random field (MRF) is one of the commonly used priors for regression coef¬ 
ficients. For instance. Smith et al. (2003); Smith and Fahrmeir (2007) applied this type of 
priors to fMRI data analyses. For the analysis of physical activity and environmental health 
data, Reich et al. (2010) developed an multivariate SVGM along with a Bayesian variable se¬ 
lection procedure to identify important SVGFs, using the spike-and-slab prior. Their focus, 
however, was on distinguishing covariate effects that were zero constant, nonzero constant 
and spatially varying instead of selecting features within the varying coefficient functions. 

As the aforementioned variable selection methods cannot be directly applied to identify 
important features for massive neuroimaging data in the SVGM framework. To fill this gap, 
we develop a novel Bayesian nonparametric prior model for the SVGF. We refer to it as 
the thresholded multiscale Gaussian process (TMGP). The TMGP prior is constructed by 
thresholding a multiscale Gaussian process, which is a combination of two types of GPs: a 
global GP to account for the entire domain spatial dependence and a local GP to accom¬ 
modate the regional fluctuations. Thus, the TMGP can characterize important common 
features of the neuroimaging data, including sparsity, global spatial dependence, piecewise 
smoothness, edge effects and jumps. The proposed TMGP prior enjoys the large support 
property, leading to posterior strong consistency in estimation and feature selection for the 
sparse and piecewise smooth SVGFs. We also develop efficient MGMG posterior compu- 
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tation algorithms based on a kernel convolution approach. A special choice of the kernel 
function enables the computation scalable to an ultra-high dimensional case. 

The remaining parts of the article is organized as follows: In Section 2, we first introduce 
the SVCMs for neuroimaging data analysis and particularly discuss conditions on SVCFs in 
the proposed model. Then we present the construction of TMGP which serves as a prior 
model for SVCFs. In Section 3, we study the theoretical properties of TMGP and the 
proposed SVCMs. In Section 4, we develop an efficient and scalable posterior computation 
algorithm based on a kernel convolution approach. We evaluate the performance of proposed 
method via simulation studies and and analyze the ABIDE data in Section 5. We conclude 
our work with a brief discussion on the future work in Section 6. 

2 Feature selection within the spatially varying coefficient functions 

We start with general notations and definitions. Denote by MP a p-dimensional real 
Euclidean space for any p > 1. For any f3 G write f3 = /32, ■ ■ ■, /3p)'^, define 

||/3||oo= maxi<fc<p|/3fc|, ||/3||i= ELil/^fcl \\(^h= Denote by C a com¬ 

pact region in the standard brain space. Let si,..., G 7^ be a set of spatial locations 
where brain signals are measured. An empirical measure on TZ induced by {si,..., is 
defined as Pn(ds) = ^ ^ where the indicator function I[A] = 1 if event A 

occurs, I[A] = 0, otherwise. For a scalar-valued function /3{-) : TZ ^ M., define ||/d(-)||oo = 
supgg 7 ^|/d(s)| and ||/9(-)||i= /^g.^|/9(s)|P„((is). For a p-dimensional vector-valued function 
/3(s) = [A(s), • • • -.TZ^W, define ||/3(-)||i,oo= maxi<fc<p||/7fc(s)||i. Denote by C{7Z) 

a collection of all the continuous functions defined on TZ. Let D°‘I3 be a partial derivative 
operator on function (3{-) (given its existence) which is given by for a G Denote 

by C^{TZ) a set of functions (5 defined on TZ with continuous partial derivatives for all 
ex such that ||q:||i< p. 

2.1 The spatially varying coefficient model for neuroimaging data 

Suppose the data set consists of m subjects. For each subject j, let yj{s) be the brain signal 
measured from a certain imaging modality at location s G 7^; and there are also p covariates 
are collected, denoted Xj = (x^i,..., Xjp)^, for j = 1,..., m. The spatially varying coefficient 
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model (SVCM) for neuroimaging data is given by 

+ ej(s), (1) 

where /3(s) = [/di(s),... ,/3p(s)]"'" is the spatially varying coefficient function (SVCF) defined 
on IZ. It characterizes associations between covariates and imaging outcomes. To be more 
specihc, {k = quantihes the fcth covariate effects at brain location s. The 

independent error process ej{s) is the assumed to be spatially homogeneous across the whole 
brain for each subject. In our model, we simply assume that ej{s) ~ iV(0, a^) for all subjects 
at all locations. 

For neuroimaging data from commonly used imaging modalities, only outcomes at loca¬ 
tions Si, ..., Sn are observed. At these locations, the SVCM proposed in (1) for the recorded 
neuroimaging data can be expressed as 

I /3(si),a^] ~ N(X/3(si), a‘^1^) (2) 

independently for alH = 1,..., n, where y{si) = [|/i(si),..., ?/m(si)]^, X = [xi, ..., Xm]'^ 
and e(sj) = [ei(sj),..., For simplicity, denote by V = an m x n matrix 

recoding all the neuroimaging outcomes involved in the study. 

In neuroimaging studies, there exists a natural region partition of the whole brain do¬ 
main Ti into bounded connected sets T^i, ...,TIg with non-empty interiors, such that IZ = 
IZg n IZg! = 0, 7 ^ q' . lu mauy cases, one can utilize Rg as neuroanatomical 

regions from commonly used labeling systems such as the Automated Anatomical Labeling 
(AAL) (Tzourio-Mazoyer et ah, 2002). For region of interest (ROI) based analysis, each 
ROI is one parcellated region. For seed-based region-level analysis, Rg can be regarded as 
the clusters showing strong functional connectivities based on preliminary results. In some 
voxelwise analysis with no regional information to be incorporated, we can simply consider 
each voxel (a 3D cubic) as a region and the centers of voxels as observed brain locations 

1 ■ ■ ■ 1 ^n- 

To utilize the proposed SVCM for analysis of neuroimaging data and feature selection, we 
state our assumptions for the SVCF in model (1). Specihcally, we work with piecewise smooth 
(p-times continuously differentiable to be accurate) functions with structured sparsity, which 
can be mathematically expressed as follows: for any function /d(s) defined on 7?. to be a 
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varying coefficient function within model (1), /3{s) must satisfy that: 


(Cl) there exists an index set Ji C {1, ..., 6 *}, such that (3{s) x I[s G TZg] G CP{lZg) where 
TZg is the closure of Rg, for all g & R with p = +1; 

(C2) for any g G Ji, /3(s) is bounded away from zero, that is, 

^ l/^(s)l> 0; 

/v^ 

(C3) let Jo = {1, then /3(s) = 0 for all s G VJg^iRRg. 


The conditions on the SVCFs can be interpreted as follows: (Cl) demonstrates that the 
functions are smooth within each brain region, which implies more homogeneous covariate 
effects; (C2) indicates the jump discontinuities at the boundaries of brain regions; (C3) 
introduces sparsity into each SVCF in model (1) and restricts the sparsity structure at the 
regional level. 

We introduce the notation V to represent a set of functions satisfying (C1)-(C3) defined 
on TZ. The definition V carries the information of Ri,..., Rq as well as A and R. For 
any function (3{s) G V, denote by /i{/9(s)} the index of nonzero sets as given in (Cl) and 
by A{/3(s)} the value A defined in (C2). In the same vein, define a set of p-dimensional 
vector-valued functions P = {/3(s) = [/Si(s), • ■ ■ ,/^p(s)]''' : PRs) G "P, /c = 1,... ,p}. 

2.2 The thresholded multiscale Gaussian process priors 

2.2.1 Construction of the prior 

A Gaussian process (GP) can be regarded as a probabilistic measure on certain functional 
spaces, making it as popular prior models in Bayesian nonparametric data analysis. In gen¬ 
eral, the GP prior, denoted by QV[p{-), G(-, •)], is determined by its mean function /i : P i—)■ M 
and the covariance kernel function G : P x P i—)■ M. A draw (5{-) ~ QV[p{-), G(-, •)] is a func¬ 
tion defined on P such that any finite collection of its function values are jointly multivariate 
Gaussian. To be specific, for any choices of si,..., G P, [/^(si),..., /3(s„)]''' iV(^,C) 

with pL = [/i(si),..., /i(s„)]''' and C = {G(sj, Sj)}i<i<n,i<j<n- The boundedness and smooth¬ 
ness of random functions generated from GP are determined through the covariance kernel 
function. Typical choices for the covariance kernel functions include but not limited to 
the rational quadratic kernel, Matern class of kernels, the square exponential kernel. Stein 
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(1999); Williams and Rasmussen (2006) contain more detailed discussions on the covariance 
functions. 


To enable detailed feature selections within the SVCFs /3fc(s) G "P (/c = 1,... ,p) in model 
(1), we develop the thresholded multiscale Gaussian process (TMGP) prior, which can be 
represented as follows: (3{s) ~ TAiQV[T^, 6 *^, A, /?(-, ■)] implies that 


/3{s) = /3( s )/ a [/3( s )], 

(3) 

/3(s) = 7 (s) + e(s). 

(4) 

7 (s) ~ gV[0,T‘^K{s,s')], 

(5) 

e{s) ~ gv[o,e^x{s,s')] 

(6) 


for all s G P, where 

G 

h0(s)] = J2l 

9=1 

is the thresholding function that generate region-wise sparse features; A > 0 is the threshold¬ 
ing parameter; > 0 and 6^ > 0 are variance parameters in the GPs; k{-, ■) : P x P i—)■ M is 
a kernel correlation function and x(s, s') is constructed from k as x(s, s') = ^ 

J[s,s' G Rg]. 

The thresholding construction of f3{s) from a particular multiscale Gaussian process (Dun- 
son and Fox, 2012, mGP) f3{s) introduces sparsity and enables feature selection. For voxel- 
wise analysis without regional information, the thresholding function in (7) can be simplihed 
as /a[/3(s)] = /[|/3(s)|> A], where s denotes the center of a voxel. The mGP /3{s) in our prior 
is a combination of one “global” GP, 7 (s), which captures the general dependence structures 
across the whole brain domain and one “local” GP, e(s), reflecting the dependence and vari¬ 
abilities within each parcellated brain region. This multiscale construction can also naturally 
generate jumping discontinuities on the boundaries of the brain regions. 

We illustrate the procedure to sample an SVGF from our TMGP priors on a two- 
dimensional square region in Figure 1, where the dashed lines partition the whole region 
into four equally spaced sub-regions. 7 ( 5 ) is smooth over the whole region; e(s) is smooth 
within each sub-region but has distinct jumps on the boundaries. As we may see from Figure 
1 , one particular issue is that the generating processes, 7 ( 5 ) and e(s), could vary substan- 


s G TZg : inf |/3(s)|> A 


se7^„ 


(7) 





r(5) e(5) P(s) Pis) 



Figure 1: Sample SVCFs from TMGP prior. Top row: weak global signals with strong local 
signals; Bottom row: strong global signals with weak local signals. 

tially while results in similar sparse SVCFs. This phenomenon indicates weak identihability 
in global and local GPs in the TMGP prior model. To resolve this issue, we impose the 
certain constraint on the marginal variance of the local GP e(s). i.e., 6^. This implies that 
we assume that the sampled SVCF is constructed from strong global signals and weak local 
signals. 

3 Theoretical results 

We hrst introduce two sets of extra conditions in addition to the conditions (C1)-(C3) 
for the SVCFs. The design matrix X as dehned in (2) satishes: 

(XI) Let drain and dmax be the smallest and largest eigenvalues of then 0 < dmin < 

dmax ^ CXD. 

For the kernel correlation functions in our proposed TMGP priors, we introduce 

the following condition: 

(Kl) k{s,s') = nt _^Kj{\\s — s' II) for some nowhere zero, continuous, symmetric density 
function (up to a normalization constant) Kj dehned on M. 

(K2) k{s, •) has continuous partial derivates up to order 2p + 2 where P = [|] + 1- 
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Theorem 1. Consider an arbitrary SVCF /9°(s) G V, i.e. /3°(s) satisfies conditions (Cl)- 
(C3). Let Ao = A[/3°(s)] and suppose the partition number G < oo. //O < A < Ao,0 < 
< oo and the kernel function n satisfies (Kl), then the proposed prior 

given in (3)-(6) satisfies that 

n (||/3(s) — /3°(s)||oo< e) > 0 for all e > 0 

Theorem 1 demonstrates that the proposed TMGP prior assign positive measures to 
arbitrarily small neighborhoods of all elements within V, the family of spatially varying 
coefficient functions dehned in our model (1). This property is essential, especially for 
Bayesian nonparametric priors, since it is necessary for appropriate posterior behaviors and 
can not be guaranteed in many cases. 

Theorem 2. Suppose that our observed data satisfies y (Si) ~ N{Xf3^{si),a'^Im) indepen¬ 
dently following the notations in (2), with a known and a fixed p-dimensional SVCF 
f3^{s) = [/3°(s),... ,/3p(s)]'^, p <m, defined on IZ. Suppose /3°(s) satisfies conditions (Cl)- 
(C3), i.e. /3°(s) G P, with G < oo; X satisfies condition (XI); the kernel function in the 

2 2 j 

TMGP priors satisfies (Kl) and (K2). For all e > 0, if ^ each dimension of 

(3{s) follows a TMGP prior independently satisfying the conditions in Theorem 1, then the 
posterior distribution satisfies that 

^[Uf I ?/(si),...,l/(s„)] ^ 0, 

as n ^ oo in P^o probability, where Ue = {/3(s) G P : ||/3(s) — /3°(s)||i^oo< £}■ 

Theorem 2 justihes the posterior consistency of the proposed TMGP prior given model 
(1) under the inhll asymptotic framework. It implies that, if a ground truth of the SVGFs 
exists and the data is generated accordingly, then the posterior distribution of (3{s) can be 
concentrated to an arbitrarily small 11 - 111,00 neighborhood around the truth as the number of 
spatial locations goes to inhnity. The conditions of this theory also imply that a small ratio 
between the number of subjects and the variance of pure noise, i.e., cr^/m, is also important 
to guarantee a good performance of our method. One limitation of Theorem 2 is that it 
does not apply to the voxel level analysis where G = n —>■ oo. However, this type of analysis 
generally works well empirically. 
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Although the 11-111,00 norm is not common in Bayesian asymptotic literatures, a direct 
result based on Theorem 2 is the element-wise posterior consistency for f3{s) under the 
commonly used empirical H-Hi norm for a fixed design of spatial locations Sj (Ghosal et al., 
2006). 

Corollary 1. Under the same assumptions and conditions in Theorem 2, for all e > 0, if 
fm < then the posterior distribution satisfies that for all k = 1,... ,p, 

n [Ulk I ^ 0 

as n —>■ oo in P^o probability, where = {/5(s) G V : ||/3(s) — /3°(s)||i< e}. 

Of note, in neuroimaging studies, Sj are usually fixed 3D grid points, thus we do not 
consider the H-Hi norms with regard to random measures for s when proving posterior con¬ 
sistency. For a fixed design within a finite domain TZ (the volume of brain is limited), 
another useful direction is to show posterior consistency under the H-Hi norm with regard to 
the Lebesgue measure. We see this as a potential extension to the theory development in 
the future work. 

4 Posterior Inferences 

4.1 Model Representation 

Now consider the SVCM defined in (1). For the p-dimensional multivariate spatially varying 
coefficient function, /3(s) = [/di(s),...,/dfc(s)]'^, we assume that 

with k{-, ■) being a smooth kernel function. This specification implies that the global pro¬ 
cesses (5) have distinct flexible variance parameters r|, while the local fluctuation processes 
(6) have a small fixed marginal variance 6*^. 

Based on the prior specification for /3(s), we have that (3k{s) = (3k{s)Ix^[(3k{s)] for k = 
1 ,... ,p, where 



/^fc(s) = 7fc(s) + efc(s). 

(8) 

where 7 fc(s) 

~ QV[Q,tIk{s,s')] and efc(s) ~ QV (^0, s') x G Rg]^ 

. For 

global GPs: 

7 fc(s) in (8), its Karhunen-Loeve (KL) expansion can be expressed as 




(9) 


1=1 
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where uu ~ iV(0, t|C/) independently with 0 > 0 such that 

that / (pi{s)ipii{s)ds = 0,V/ 7 ^ V based on the Mercer’s Theorem (Williams and Rasmussen, 
2006). In practice, we truncate the inhnite sum in (9) into L terms such that 0 

is close to 1. 

This implies a model representation of the proposed SVCM along with the TMGP prior 
specihcations. To be more specific, the neuroimaging signal yj(s) on locations si,...,s„ 
can be modeled through latent mGPs: /3(s) = [/?i(s),...,/3p(s)]"'" and the truncated KL 
expansion coefficients with = [wfci,..., WfeL]"'' by integrating out the local GPs 

efc(s) in (8), which is given by 



[yY^i] 

1 1 /3(Si),CT^ 

] ~ N (^a;J^A[/3(si)], Yj , 

(10) 




Uk] ~ N [ifgUk, 6‘^Kg) , 

(11) 



Ukl 

-NiyQrl), 

(12) 

for j = 1,.. 

., m, i = 1,..., n. 

A; = 1,... 

,p, = 1,..., G and / = 1,. 

.., L, where a^) 


represents a normal distribution with mean p and variance ipg = {c^( with 

ip{si) = [(pi(si),... ,(pi(si)]'^ and Kg = Si')}s„s,,e 7 e 9 is a correlation matrix. The p- 

dimensional vector value functional operator g\{-) is dehned on the domain of all functions 
in V, which is given by 


£/a[/3(s)] 


T 


Pi{s)Ix, [A(s)], ■ ■ ■, /?p(s)/aJ/?p(s)] 
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with A = [Ai,..., Ap]"'". 


4.2 Hyper Prior Specifications 


We assign conjugate priors to variance parameters cr^ and r| in the TMGP model, i.e. 
cr^ ~ Inv-Ga(n,tc) and r| ~ Inv-Ga(n, tc), where Inv-Ga(f,tc) represents an inverse gamma 
distribution with shape v and rate w. We fix = 1 to restrict local deviations in order to 
sort out the weakly identihability in the model, especially for the case that a small number 
of observations are recorded in each region. 

We develop a data-driven method to specify the prior of thresholding parameters A^, k = 
1,... ,p. We consider the log full conditional of A^ which is given by 

log 7r[Afc I y, /3fc, (3-k, 0 -^] = ^[Afc; 3^, /3fc, /3-fc]/cr^ + G, 
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where C is a constant, Pk = [/3fc(si),...,/3fe(s„)]'^, f3k = [/^^(si), • • •,/^fc(Sn)]'^, f^-k = 
[/3i,..., (3k-i, (3k+i, /3p]^, and 

n 

i(\k) - i\^k\y,0k,0-k] = J]]mt(s.)/A,[/3t(s.)i, 


(13) 


i=l 


with Uk{s) = Yl%i^k{s)xjk 2yj_k{s) - Pk{s)xjk and yj-k{s) = yj{s) - ZljVfc 
The function t{\k) is flat when is around zero and dramatically decreases when is 
greater a certain value, to which we refer as a “turning point”. It should be close to the true 
threshold. Figure 2 shows the prohles of ^{\k) for a model with three SVCFs on a space of 
900 locations from 50 simulated datasets. The true thresholds A^ = + 1 for A; = 1, 2, 3. 

The turning points in the prohles of i.{\k) are all around the true thresholds. Thus, we can 
specify the priors of \k according to In practice, we need to provide rough estimates 

of /3fc and f3-k in order to evaluate f'(Afc) before posterior inferences. We consider an SVCM 
with smoothed SVCFs approximated by the truncated K-L expansion, where we compute 
the ordinary least squares (OLS) of the coefficients, i.e. 

m n / p L 

{wik}f=il=i = arg min EEl yji^i) - ^jk^li^^ij'Wlk 


j=l i=l 


k=l 1=1 


Then both /3fc(s) and /3fc(s) can be approximated by /3fc(s) = Thus, we 

replace f3k and /3_fc by ^k = [3fc(si), • • •, Ac(sn)]’^ and = [3i, • • ■,Pk-i,Pk+i, ■ ■ ■,PpV, 
respectively, in (13). Write i{Xk) = i{Xk',y,l3k,P-k)- 

We propose to assign uniform priors to Xk, i.e. Xk ~ Unif(cfc — hk, Ck + hk), where the half 
range hk and center Ck can be determined based on the prohle of (.{Xk). More specihcally, 
we evaluate t{Xk) on a set of grid points {A[,^\ ..., denoted , ■ ■ ■, Given an 


interval (a, 6 ), dehne the sample correlation between Xk and i{Xk) within (a, 6 ) as 

p(a,6) = 


with Xk = Y1 


Xye{a,b) 


^kyyY.x(3)(z(a,b)^^P 

Xa) IIX 1 . - ^ . ffia) 


yp/M{a,b), 4 = E 4 .),(,,,) 4 " 7 M(a, 6 ) and M(a,&) = ^ 


(a, b)]. And dehne 

Ckih) = min{Ai,®^ : {^X^ - h, X^ + h)\> (k}, 

where 4 is determined by the rejection region of Pearson correlation test. Given h > 0, Ck{h) 
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Figure 2 : Simulated ^(A^) from 50 synthetic datasets: ground truth (Ai = 2, A 2 = 3, A 3 = 4) 
are marked in the hgures. 


represents the location that £{Xk) and Xk have no signihcant correlation. Then we specify 
hk = min{h : \p{ck{h) - h,Ck{h) + h)\> and Ck = Ck{hk). 

This leads to an informative prior range [hk — Ck, hk + Ck] for Xk with a high probability to 
cover the turning point of £{Xk). 

4.3 Kernel Expansion for Massive Data Analysis 

Theoretically, the KL expansion for the GP with kernel function relies on solving the 

integral equation f k{s, s')ip{s)ds = (iip(^s'), which might not admit analytical solutions. 
Empirically, the expansion is often achieved by calculating the eigenvalues and eigenvectors 
of the n xn correlation matrix Kn = on a set of pre-specihed locations. 

However, in the analysis of massive neuroimaging data that can involve a very large number 
{n can be hundreds of thousands) of brain locations, it is computationally infeasible to 
perform eigen decompositions on Kn- To solve this issue, we introduce the modihed square 
exponential kernel 

k(s, s') = exp{—a||s|| 2 —a||s'|| 2 — 6||s — s' 112 }, a, 6 > 0 (14) 

with a relatively small value for a as a numerical approximation to the square exponential 
kernel when dealing with massive neuroimaging data. The major beneht of this kernel 
function is that it has analytically tractable expansion. The detailed properties of this 
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kernel is summarized in Proposition 1. 


Proposition 1. For a specific I G {l,...,oo}, define series {ki}f^Q, ^■5 

follows 




ki + d — i — 1 
d — i 


< L < 


ki + d — i 
d — i 


— 1>, 0 < i < d — 1, kd = 0, 


Iq I Ij li—l 


ki_i + d-i 


, ^ > 1 , 


d — i + 1 

rrii = ki_i - ki, i> 1, 

where is the set of nonnegative integers; = 0 if k > n. Define L = = 

^ ~ ['^1’ •••’ ^ Fi{^) Cl = 1) •••) oo) be the eigenfunctions 

and eigenvalues for the modified sguare exponential kernel k{s,s') as defined in (14), then 

^k + d — D 
d — 1 




s'"", = {1 - B)'‘Y, 

1 ^ 1=1 V 


k=0 




ipi{s) = {2cyi exp {-c\\s\\l)Y\_HmfiV^Si), 

i=l 

where c = -v/a^ + 2ab, A = a + b + c and B = h/A; Hk{-) is the kth (k G order normalized 
hermit polynomial, which is defined by Hk{x) = {2^k\ exp[x‘^)^ exp[—x‘^). 


4.4 A Markov chain Monte Carlo Algorithm 


We developed a generally efficient MCMC sampling algorithm for posterior inference about 
[{/3(sj)}”^j^, (T^, A | 3^] based on the representation and approximation 

for our model with the TMGP priors (10)-(12). 

Updating /3(si), i = 1,..., n, is an essential step in the MCMC algorithm. The Metropolis- 
Hasting (M-H) algorithm is employed with a block updating scheme separately for si&iig) 

g = 1,... ,G to facilitate efficient chain mixing. Under the scenario where region partition 
structure T^i,..., IZg is available and reliable, we can directly use this partition information. 
For voxel level analysis or analysis where no prior knowledge about the regional information 
are adopted, we hrst £t voxel-wise CLMs and then use certain clustering algorithms to clus¬ 
ter the resulting spatially varying coefficient values. This initial clustering results for the 
brain locations (usually centers of voxels) are used for block updating. Another M-H step in 
our algorithm is updating A^, A; = 1,... ,p, the thresholding parameters in the TMGP priors. 
The remaining parameters {uk, k = 1,... ,p) and hyperparameters (a^, r|, k = 1,... ,p) are 
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updated by directly drawing samples from their full conditionals due to conjugacy. More 
details about the MCMC algorithm are available at the appendix. 

4.5 Posterior Inference on SVCFs 


With the recorded MCMC samples and f = 1,... ,T, we can achieve three 

major goals: 1) selecting neuroimaging features; 2) estimating covariate effects at the feature 
regions; 3) making prediction on the covariate effects at any brain location. 

To select the important imaging features at the regional level, we estimate the selection 
probability of every region, g = 1,... ,G, according to the dehnition (C1)-(C3), using the 
MCMC samples as 


P{g eh) = p 


inf \(3k{si 

l<i<n,Si£7Zg 




1 

T 




t=i 


inf 


then we estimate (3{si) as follows, if Sj G TZg, 




(15) 


E[Pk{si) I mfi<j<ri,sjeng\Pk{sj)\> Xk,y], P{g e h) > q 
0, P{g e Ji) < g 

for all fc = 1, ...,p, where 0.5 < g < 1 is a threshold for the posterior probabilities of being 
nonzero at certain brain locations. We use g = 0.90 throughout the rest of our analysis. 
Estimates for the posterior conditional expectations in (15) can be easily calculated based 
on the posterior samples. 

As a special case, to conduct voxel level selection (i.e., each voxel is a region with voxel 
centers being Si,..., s„), we can simply adapt (15) to 


f^ki^i) 


\ E[hY^) 

1 |/3a:(sOI> Afc,3^], P{\hYi)\>\k 

\y)>(i 

jo, 

HhY^)\>\k 

\y)<(i 




(16) 


can be regarded as the posterior 


where P{\^k{si)\> A^ | 3^) ^ Et=i ^ (si)l> A), 

probability of activation for each voxel. 

Making predication on /3(-) at an arbitrary new brain location sq G 7^ with the posterior 
samples is also available. Without loss of generality, suppose Sq ^ Pg, then 

P0k{so) I ^) = y P0k{so) I ^l,y) X I y)d^l 
where p(-) represents the probability densities; /3^ is actually {/3fc(sj)}s-g7^g; [/3fc(so) I /3fc, 3^] = 
[/3fc(so) I /3^] ~ N{(p{soyuk + kg^soY- (pgU^), - B‘^kg{syYKYkg{sy)) with 
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kgiso) = {k,{sq, This indicates that we can predict (3k{-) at an unobserved 

location sq as 


Pk{so) 


? + kgisoVKg TZg is selected 

0 , TZg is not selected 


(17) 

where is the corresponding vector for /3^ based on the tth recorded MCMC sample. 

5 Numerical Examples 
5.1 Simulation Study: Synthetic 2D Imaging Data 


For demonstration purpose, we consider a 2D case, i.e. TZ C [0,1]^ in this simulation, three 
covariate functions, (3i{s), (32{s) and are created on 71 as shown in Figure 3 (the “true” 

column). We consider n = 30 x 30, 50 x 50 and 60 x 60 locations within the hxed domain 
TZ. The data is generated from 

yj{si) = I3i{si)xji + /32{si)xj2 + /33{si)xj3 + ej{si), (18) 

where ej(sj) ~ N{0,a‘^) and Xji ~ iV(0,4),a;j2 ~ Unif(—1,1),Xjs ~ Bernoulli(0.5). We 
considered two sample sizes m = 50 and m = 100 in combination with two different noise 
levels cr^ = 2 and cr^ = 4. Given one combination {n, m, a^}, 50 datasets are independently 
generated. The MCMC algorithm is implemented to fit the SVCM model and select infor¬ 
mative features for each dataset. We choose the SE kernel for the TMGP priors. The spatial 
range parameter, b, was hxed as 30. The priors for the thresholding parameters were hxed 
as Unif(0.3,1.25). All pixels are divided into four subsets for block updating according to 
/c-means clustering results. The MCMC iterations are implemented 10, 000 times with the 
hrst 5,000 samples discarded as burn-in. For each simulated dataset, the algorithm usually 
takes less than 1 minute to complete the 10,000 iterations on a standard Intel i7 quad core 
desktop PC. 

We compare our results to the standard voxelwise GLM methods. The features estimated 
from the GLM method are thresholded based on the p-values for testing whether Pk{si) = 
0. We considere both the direct thresholding based on naive f-test (GLM-f), thresholding 
using the FDR adjusted p-values (GLM-FDR) (Benjamini and Yekutieli, 2001; Benjamini 
et ah, 2006) and thresholding by controlling FWER based on standard random held theory 


17 


(GLM-RFT) (Nichols and Hayasaka, 2003). Figure 3 presents the estimated covariate effect 
functions from GLM-t, GLM-FDR, GLM-RFT and our method based on one simulated 
dataset from our experiments. Figure 3 shows that our method provides more accurate 
feature selection results by eliminating most false positive signals as well as maintaining 
high sensitivity. Our estimates also recover the features more accurately by incorporating 
spatial smoothness. In addition, our posterior inference procedures can provide, for each 
pixel in the images, the probability of presenting features that are different from zero, as 
shown in the last column in Figure 3. For all the scenarios, we report in Table 1 the relative 
mean square errors with regard to the GLM estimates, which is defined as 

Er=iELi 


ReMSE = 


2 ’ 


(19) 


where /3k{si) are the estimates from a certain method, /3l{si) are the voxel-wise GLM es¬ 
timates without any thresholding and Pki^i) represent the true values. We also reported 
the false discovery rates (FDRs) and the false negative rates (FNRs) in Table 1, which are 
specified as: 

Er=i ELi i0k{si) ^ 0] X i[[3k{si) = 0] 


FDR = 


FNR = 


Er=i ELi I[Pk{si) = 0] X I[/3k{si) ^ 0] 


( 20 ) 


( 21 ) 


Et.ELjms,) ^ 0] 

Based on the results from Table 1, our method performs well consistently in terms of both 
feature selection (small FDRs and FNRs) as well as estimation (small ReMSE), especially 
when the noise level is high or the number of subjects is small. Random field theory based 
thresholding performs well at low noise level but deteriorates notably as noise level increases 
due to low sensitivity. FDR control is also relatively robust but this method consistently 
generates false positive signals. The performance of our SVGM-TMGP method also increases 
as the number of spatial locations increases within a fixed domain, which agrees with our 
posterior consistency theory based on infill asymptotics. 

To comprehensively compare the performance of TMGP priors in feature selection with 
other common methods, we also conduct the receiver operating characteristic (ROG) analy¬ 
sis. Since our original method will automatically generate the optimal thresholding values, in 
this ROG analysis, we fix A at different values and rerun the MGMG simulation to alternate 
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Figure 3: Column 1-5: true and estimated spatial covariate effects from GLM-t, GLM-FDR, 
GLM-RFT and SVGM-TMGP; Golumn 6: the selection probability estimated from SVGM- 
TMGP. The result is generated from one simulated dataset with m = 50 subjects, n = 3600 
pixels and noise level = 4. 


the specihcities. Figure 4 shows the ROG curves of our method, GLM-FDR and GLM-RFT 
with = 4 and m = 50 (n = 900, 2500 and 3600, respectively). To quantify the differences, 
we calculate the area under ROG curves (Table 2) when the false positive rates are smaller 
than 0.1 for all three methods. Figure 4 and Table 2 show that, under all these three set¬ 
tings, our method (SVGM-TMGP) achieve the best performance; as n increases, our method 
proves to have improved performance. The random held correction based on FWER control 
(GLM-RFT) suffers from serious false negative problems, leading to low statistical powers. 
The FDR control (GLM-FDR) is a competitive alternative to our method, especially when 
the number of spatial locations, n, is relatively small. 

To demonstrate the Bayesian learning of the thresholding parameters, we present the 
histograms for our recorded MGMG samples along with the trace plots for the whole Markov 
chain from one simulated dataset. It is clear that the marginal posterior distributions of the 
all thresholding parameters are different from the same prior Unif(0.3,1, 25). This indicates 
our model can achieve Bayesian learning (Xie and Garlin, 2004) of all the thresholding 
parameters. 
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Table 1: Quantitative comparison of SVCM-TMGP to voxel-wise GLM fitting resnlts with 
various thresholdings. All results reported are the means and standard errors based on 50 
independently simnlated datasets. 




ReMSE 

FDR(%) 

FNR(%) 

ReMSE 

FDR(%) 

FNR(%) 

n=900 



= 2, m = 

50) 


= 4, TO = 

50) 


GLM-t 

0.39(0.12) 

18.3(4.3) 

O.O(O.O) 

0.40(0.23) 

17.5(5.3) 

1.5(0.2) 


GLM-FDR 

0.24(0.07) 

5.0(1.0) 

O.O(O.O) 

0.32(0.08) 

5.4(1.5) 

4.1(0.6) 


GLM-RFT 

0.36(0.17) 

O.O(O.O) 

4.4(1.5) 

0.80(0.29) 

O.O(O.O) 

22.7(5.6) 


SVGM-TMGP 

0.19(0.08) 

0.9(0.4) 

0.8(0.2) 

0.12(0.02) 

2.8(0.9) 

0.7(0.2) 

n=900 


(ct2 : 

= 2, TO = 

100) 


= 4, TO = 

100) 


GLM-t 

0.41(0.13) 

18.9(3.5) 

O.O(O.O) 

0.39(0.11) 

19.2(4.9) 

O.O(O.O) 


GLM-FDR 

0.25(0.06) 

5.2(0.7) 

O.O(O.O) 

0.22(0.06) 

5.1(1.2) 

O.O(O.O) 


GLM-RFT 

0.17(0.10) 

0.2(0.0) 

O.O(O.O) 

0.31(0.14) 

0.5(0.0) 

4.6(1.3) 


SVGM-TMGP 

0.20(0.08) 

0.2(0.0) 

0.2(0.0) 

0.19(0.08) 

1.3(0.4) 

0.6(0.2) 

n=2500 



= 2, TO = 

50) 


= 4, TO = 

50) 


GLM-t 

0.32(0.10) 

28.1(5.3) 

O.O(O.O) 

0.33(0.16) 

28.8(6.5) 

0.4(0.1) 


GLM-FDR 

0.13(0.02) 

2.9(1.0) 

0.3(0.1) 

0.20(0.08) 

4.3(1.6) 

5.0(1.6) 


GLM-RFT 

0.27(0.14) 

0.3(0.0) 

6.3(2.1) 

0.61(0.29) 

0.4(0.2) 

27.9(4.5) 


SVGM-TMGP 

0.12(0.06) 

0.2(0.0) 

0.6(0.1) 

0.08(0.03) 

1.2(0.4) 

0.9(0.4) 

n=2500 


(a^ ■ 

= 2, TO = 

100) 


= 4, TO = 

100) 


GLM-t 

0.34(0.09) 

29.4(4.9) 

O.O(O.O) 

0.35(0.12) 

29.2(5.0) 

0.0(0.2) 


GLM-FDR 

0.19(0.06) 

5.4(0.8) 

O.O(O.O) 

0.14(0.03) 

4.6(1.3) 

O.l(O.O) 


GLM-RFT 

0.10(0.05) 

O.l(O.O) 

O.l(O.O) 

0.30(0.14) 

0.3(0.0) 

7.0(2.3) 


SVGM-TMGP 

0.15(0.04) 

O.l(O.O) 

0.3(0.0) 

0.13(0.05) 

0.4(0.1) 

0.6(0.1) 

n=3600 



= 2, TO = 

50) 


= 4, TO = 

50) 


GLM-t 

0.33(0.09) 

35.2(6.2) 

O.O(O.O) 

0.31(0.11) 

31.5(7.3) 

0.7(0.4) 


GLM-FDR 

0.12(0.03) 

3.8(1.0) 

0.2(0.0) 

0.17(0.04) 

3.8(1.3) 

5.2(1.4) 


GLM-RFT 

0.25(0.07) 

0.2(0.0) 

7.6(2.2) 

0.48(0.10) 

0.3(0.1) 

28.7(4.6) 


SVGM-TMGP 

0.10(0.03) 

O.l(O.O) 

O.O(O.O) 

0.07(0.02) 

1.1(0.3) 

1.8(0.6) 

n=3600 


(ct^ : 

= 2, TO = 

100) 


= 4, TO = 

100) 


GLM-t 

0.33(0.06) 

34.4(5.1) 

O.O(O.O) 

0.33(0.09) 

33.5(6.0) 

O.O(O.O) 


GLM-FDR 

0.18(0.02) 

4.3(0.5) 

O.O(O.O) 

0.13(0.03) 

2.6(0.7) 

O.O(O.O) 


GLM-RFT 

0.09(0.06) 

O.O(O.O) 

0.2(0.5) 

0.24(0.07) 

O.l(O.O) 

6.5(1.3) 


SVGM-TMGP 

0.12(0.03) 

O.O(O.O) 

0.5(0.0) 

0.11(0.03) 

0.2(0.0) 

0.5(0.3) 


Table 2: Area nnder the ROG curves (AUG) with false positive rates are within [0,0.1] 





AUC 

xlO-i 




n = 

= 900 

n = 

2500 

n = 

3600 

GLM-FDR 

0.978 

(0.010) 

0.976 

(0.014) 

0.979 

(0.005) 

GLM-RFT 

0.912 

(0.049) 

0.915 

(0.080) 

0.911 

(0.072) 

SVGM-TMGP 

0.982 

(0.012) 

0.989 

(0.007) 

0.998 

(0.001) 


5.2 Data Application: The Autism Brain Imaging Data Exchange (ABIDE) 

We apply onr method to the data from ABIDE, which is a consortium collecting and sharing 

resting-state fMRI data from 1,112 subjects. Govariate information snch as age at scan, sex, 
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Sensitivity 


Sensitivity 


Sensitivity 





Figure 4: ROC analysis results from 50 replicated datasets. Three competing methods are 
TMGP-SVCM, GLM-FDR, GLM-RFT. The variance are all cx^ = 4; subject numbers are 
all m = 50; the number of spatial locations are n = 900,2500 and 3600. 



(A) Histograms for posterior samples of A 


. 

6000 8000 10000 12000 
Number of MCMC iterations 

(B) Traceplot for posterior samples of A 



Figure 5: Histoplots of A posterior samples and traceplots for the related Markov chain. 
Results are generated from one simulation dataset with n = 3600, cr^ = 4, m = 50. Initial 
prior specihcations for Ai, A 2 and A 3 are all Unif(0.3,1.25). 


IQ, handedness and diagnostic information are also available from ABIDE studies. Among 
the subjects, 539 individuals have Autism spectrum disorders (ASD), which are character¬ 
ized by symptoms such as social difficulties, communication deficits, stereotyped behaviors 
and cognitive delays. The remaining subjects are the age-matched normal controls (NG). All 
the fMRI images are preprocessed through slice-timing, motion correction, nuisance signal 
regression and temporal filtering. The resulting fMRI data, which are 91 x 109 x 91 3D matri¬ 
ces, are normalized and registered to Montreal Neurological Institute (MNI) 152 stereotactic 
space. We aim to investigate the voxel-wise measures of latent functional architecture of 
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A5: age X ASD group gender x ASD group 


Figure 6: i{Xk) for specifying A priors in the analysis of ABIDE data. The colored shades 
mark the intervals we choose as the range of the uniform priors for A. 

the brains through fractional amplitude of low-frequency fluctuations (fALFF)(Zou et ah, 
2008). fALFF is a metric reflecting the percentage of power spectrum within low-frequency 
domain (0.01 — O.lHz) which characterizes the intensity of spontaneous brain activities. We 
calculate the fALFF for each subject at every voxel. Since the fALFF is restricted to (0,1), 
we perform the following monotone transformation 

%•(«) = log , (22) 

where fj{s) represents the fALFF for subject j = 1,, 1112 at brain location s and treat 
the transformed data as our outcomes. 

The covariates we choose for htting model (1) are [1, group, age, gender, groupx age, 
groupX gender]. We use all the voxels at the gray matter as the observed spatial locations 
Si,... ,Sn and all the anatomical parcellation based on MNI templates as our brain regions 
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(A) Covariate effects for the ASD group versus the control 



(B) Covariate effects for the age 



(C) Covariate effects for group and gender interaction 


Figure 7: Estimated SVCFs (top row in each subplot) and regional selection probabilities 
(bottom row in each subplot) based on posterior samples from our MCMC algorithm for 
“ASD group”, “age” and “ASD groupx gender” 
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TZi,... ,71 g {n = 177,743 and G = 116). The imaging outcomes are centered across all 
subjects at each voxel. The group variable equals to 1 for the ASD subjects; the ages are 
all centered and scaled with zero mean and unit variance; the gender variable equals to 
1 for female subjects. The priors for the thresholding parameters are determined through 
the method described in subsection 4.2. The prohles of ^{\k) are shown in Figure 6. The 
priors for the six thresholding parameters are Unif(0, 0.03), Unif(0.05, 0.15), Unif(0.03, 0.13), 
Unif(0.1, 0.25), Unif(0.02, 0.06) and Unif(0.05, 0.2) according to the plots. The Gaussian 
kernel we use is the modihed square exponential kernel with a = 0.25, 6 = 95 (bandwidth 
hxed without updating). To achieve 90% recovery rate of the KL expansion, we set T = 1,140 
eigenfunctions. The MCMC algorithm runs 60, 000 iterations with 25, 000 burn-in. 

Based on our results, the ASD subjects tend to show lower fALFF outcomes at the 
median and superior part of the right occipital lobe, which is the visual processing centers 
of human brains (the visual cortex). We observe signihcantly higher activities at the right 
fusiform gyrus, which has been reported to be related to Autism in (Hadjikhani et ah, 2004). 
Similar hndings are observed at the right median orbitofrontal cortex, the region involved in 
most human cognition processes, especially decision-making, indicating more spontaneous 
brain cognition activities among the ASD subjects. From the axial view. Figure 7(A) shows 
the information discussed above. Some other regions that are selected includes the right 
thalamus and the right anterior cingulum, which we do not discuss here in detail. 

Another major hndings are the age effect on the fALFFs. We identihed three brain 
regions that show higher fALFF outcomes as the age increases: the median occipital lobe, 
the median temporal lobe and the angular gyrus. These regions are generally involved in 
brain functions such as spatial temporal cognition, language, memory, attention and visual 
processing. Figure 7(B) shows the hndings above in brain slices from the axial view. 

Although no specihc regions of interest are observed for the “gender” variable, certain 
brain regions demonstrate diherent activation patterns its interaction with the disease group. 
Specihcally, female ASD subjects have higher fALFFs as compared with male ASD ehects 
at the left median and superior part of the orbital gyrus but lower fALFFs at the left frontal 
lobe gyrus and the left rectus. Figure 7(C) shows these hndings in three views for the ease 
of demonstration. Beyond these hndings, we also note that the right inferior temporal gyrus 
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displays smaller effects among the female ASD subjects as compared with the male autistics. 

6 Discussion 

In this paper, we introduced a new family of prior, the TMGP prior, for feature selections 
within spatially varying coefficient functions and its applications to massive neuroimaging 
data analysis. We demonstrate the prior large support properties of the TMGP prior and its 
posterior consistency under the spatially varying coefficient models under the spatial inhll 
asymptotics. Simulation studies show that the TMGP prior is especially useful for imaging 
feature selections with relatively large noise or small sample sizes. 

In most spatial statistics literatures such as Diggle et ah (1998); Gelfand et al. (2003); 
Smith et al. (2002), a spatial process are decomposed into three parts: a deterministic trend 
process or, in other words, mean process; a zero-mean variance process with continuous 
sample path and a zero-mean white noise process, i.e., the nugget effect. Under this general 
framework, Zhu et al. (2014) considered a more complex model as compared with our model 
(1), which could be expressed using our notations as 

%(s) = *J/3(s) + Vjis) + ej{s), (23) 

for all subjects j = They estimated the additional term 'r]j{s) for every subject 

through standard local linear regression techniques, which do not require a pre-specihed 
spatial covariance structure or equivalently, its Karhunen-Loeve basis. Since in this 
paper, our primary focus is on the GLM framework for imaging data, we did not apply 
our TMGP prior under the setting of model (23). To enable a similar analysis using the 
TMGP prior for f3{s) in (23) under the Bayesian framework, there are four major tasks. 
First, a proper prior specihcation for rij{s) needs to be introduced which should be flexible 
enough to capture various spatially smooth dynamics. Second, a computationally efficient 
algorithm for estimating the additional parameters brought by fjjis) is required since this 
set of parameters scale with the number of subjects. Third, since for each subject, we will 
have a subject specihc random effect term, we need to carefully monitor the model htting 
procedures to avoid potential over-htting issues. Fourth, the theoretical analysis for posterior 
consistency in Theorem 2 needs to be adapted to the more challenging model structure. 

In addition to applying the TMGP prior to model (23), our study can be extended to 
some other directions. With a focus on neuroimaging studies using model (1), we can extend 
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the prior constructions to enable self-guided parcellation while conducting feature selection 
tasks. This can help relax the region based sparsity assumptions for the SVCFs. We can 
also explore the Bayesian asymptotic theories when the number of brain region partitions 
diverges. With a focus on general Bayesian analysis, we can extend TMGP for modeling high¬ 
dimensional multivariate binary processes or selecting features for scalar-on-image models 
such as neuropsychiatric disease predictions. 


Appendices 


A Proof of Theorem 1 

Based on the assumptions for /9°(s), let A = and Iq = {1,..., G}\/i, we have 

that 

n(||/3(s)-/30(s)|U<£)> 

n( sup |/3(s)-/3°(s)|< £, inf |/3(s)|>A, sup |/3(s)|< A 

YseU96/^7^9 seU36/g7^9 

(24) 

Without loss of generality, we only consider 0 < £ < Aq —A. Note that for all s G Ugg/^T^g, 
|/3(s) —/?°(s)|< e and |/9°(s)|> Aq implies that |/3(s)|> Aq —£ > A, then (24) is equivalent to 

n (||/3(s)-/3°(s)||oo< e) > n ( sup |/3(s)-/3°(s)|< £, sup |/3(s)|<A 

y^SUgg/j7?.g SeUggi(j7?.g 

Let 4’i{s) and = l,...,cx), be the normalized eigenfunctions and eigenvalues of the 
kernel function «:(•, •), then the Karhunen-Loeve expansions of 7(5) and e(s) can be expressed 
as 7(5) = ^ and e(s) = ^ ® ^ '^g^9 = such that 

ui ~ A(0,(Cir^), vig ~ N{0Xid'^) and ui,vig are all independent. Since the RKHS of k(-, •) 
is C(7l), for s within any partition TZg, can be represented as ) where 

E OO 2 ^ 

l=lWtg < 00 • 

For s G TZg with g E Ii 

sup |/3(s) - /?°(s)|< sup \h,g{s) - /?°,g(s)|+ sup |/32_g(s)|+ sup |/?i)g(s)|, (25) 

S^T^g S^T^g 

where ^L,g{s) = = Uti^igM^), h,gi.s) = - h,g{.s) 
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and /5°*g(s) = /3°(s) — /3^^g{s). Since the RKHS of k{-,-) is C{71), P{s) is nniformly con- 
tinnons on TZg with probability 1, then by Theorem 3.1.2 of Adler and Taylor (2009), 
lim^^^oosnpgg7^^|/32g(s)|= 0 with probability 1. By the nniform convergence of the se- 
™s to /^°(«) as L -)■ cx) on Tig, lim^^oosnpgg7^J/3^*g(s)|= 0. Then we 

can hnd a hnite integer Lg snch that for all L > Lg, snpgg7^^|/32^g(s)|< | with probabil¬ 
ity 1 and snpgg7^^|/3j(*g(s)|< |. Since 0;(s),/ = l,...,Lg are all continnous fnnctions in 
on 71, we have that maxi<Kig||0i(s)||oo< where is a certain constant. Let 

\ui -l- vig — wig\< 3 ^ for all I = 1, ...,Lg and consider L = Lg in (25), we have that 

suPse7^9l/3L,g(s)-/5L,g(s)|< |- Thns, the condition \ui + vig - wig\< can 

gnarantee that snpgg^^^|/3(s) — /9°(s)|< e with probability 1 for G h. 

For s G Tig with g G Iq, similar to (25) and the dehnitions above, we have 

snp |/3(s)|< snp \PL,g{s)\+ snp \P^g{s)\. (26) 

sGT^g sGT^g sGT^g 

Similarly, we can find Lg and snch that \ui + vig\< 2 l ^ ~ gnarantees 

that snpgg 7 ^^|/d(s)|< A with probability 1 for all g G Iq. 

Then we have 

n (||/3(s) -/3°(s)||oo< s) > + nzg - w;ig|< : I = l,...,Lg,g G Jij U 

I \ui + Vig\< 2LgM^^Lg Lg, G /o| ^ >0, (27) 

dne to the positive measnres assigned on arbitrary nonempty sets by the + -^maxj- 

dimensional mnltivariate Ganssian distribntion: {ui, ..., vl^i, ..., Vig, ■■■, vlgg), 

where L^ax = maXg=i,.„,G Lg. 

B Proof of Theorem 2 

B.l KL neighborhood conditions for noniid outcomes 


Lemma 1. Consider our observed data y{si) G M”^, y{si) ~ fi,(i{y) where 
Mv) = (27ra^)“”^/^exp i^-^\\y - X(3{si)f^ 


for some constant > 0. Define 

A(/3o, /3) = log 4^, K,{f3o, (3) = [A(/3o, f3)], VM, f3) = 

Ji,f3 

If we a assign an independent TMGP prior for each dimension of (3, 

h{s)^TMgv{rl,el,\u,K{;-)), 


i.e., 
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then we have that 3B,Il{B) > 0 such that 


liminf n \ \ (3 & B \ /3) < e > ) >0, 

n-s-oo \ n / 


i=l 




5^V'(/3“,/3)^0,V/3eB. 


2=1 


Proof. It is trivial to have that 

1 


A(/3o,/3) = ^(/3 (si) -/3(si)) 




X'y--XX'if3^is,)+i3is,)) 


then since = X(3{si), = a^I^, 

A'i(/3„,/3) = f^{l3«(s.)-l3(s.)VX^X{l3«(s.)-l3(s.)) < "^\\l3°(s,) - I3{s,)\\l 

1 TY) (i 

VM,f3) = -^{(3\s,)-(3{s,)yx^X{(3\s,)-(3{s,)) < -^\\(3\s,) - (3{sM 






Now consider Bk = |/5fc(s) : ||/3fe(s) - /3°(s)i|oo< and let B = r\\^^Bk. Since the 

priors for (3k{s)^k = are independent and 11(1?*.) > 0 dne to Theorem (1), n(i?) = 

nLi n(i?.) > 0. 

For all/3 G B, Ki{f3o, f3) < I]Lill/^fc(®)-/^fc(^)llL< £ and similarly 14(/3o,/3) < ‘2e. 
Then liminf^^oo u {{f3 e B : K,{f3^, /3) < e}) = n(i?) > 0 and ^ Er=i /3) < 

— —>■ 0 for all d E B. □ 

n ' 

B.2 Sieve constructions 

Define the set of fnnctions 

Vn=\ /3(s) E V : ||/?(s)||oo< \/n, snp |D"/?(s)| < g E /i[/?(s)], 1 < ||q;||i< p > , 

SST^g J 

as onr sieve constrnction. 

Lemma 2. If G < oo, the e-covering number under the sup-norm for Vn satisfies 


\ogN{e,Vn, IMIoo) < Cn^PE 


for some finite constant C. 


Proof. Dehne 


Vn^g = < /^(s) e CPiflZg) : snp |D“/?(s)|< Vn, 0 < ||a||i< p > , 

seTZg 

for all (7 = 1,..., G. Theorem 2.7.1 of Van Der Vaart and Wellner (1996) implies that 

\ogN{e,Vn,g, IMIoo) < Ggn^pe~'^, 









for some constants Cg < oo. Then by the definition of Vn, we have that 

G 

N{e,Vn, IMIoo) < WN{e,Vn,g, IMIoo) < exp , 

9=1 

where C = Cg < oc. □ 

Lemma 3. Consider the TMGP prior for f3{s) with kernel function satisfying condition 
(K1 )(K2), then HifP fl Vff) < De~^ for some constant D, 6 > 0. 


Proof. The constrnction of the TMGP prior implies that 

n(Pn) > TT n I snp |z^“ 4 (s)|> o < 

3=1 





where /3g(s) ~ QV{0, ( 6 *^ + t‘^)k{s, s')) for all g = I, ■■■, C. By applying Theorem 5 of Ghosal 
et ah (2006), we have that 11 ^snpsg 7 ^^|Zi)"/ 3 g(s)|> ^/n,0 < ||q!||i< pj > 1 — Ae~^"' for some 
A, 6 > 0 , given that n{s, ■) has continnons partial derivatives of order 2p + 2 on the compact 
set TZ. Then we have that 11(7^ fl V^) < 1 — (1 — Ae~^'')'^ < where D = AC dne to 

the fact that (1 — x)'^ > 1 — Gx for all 0 < a; < 1 and G = 1 , 2 ,.... □ 

Now we define P„ = {/3(s) = [/3i(s),...,/3p(s)]''" : (dk{s) G Vn,k = l,...,p} here and 
below. Then we can easily get that 

N{e,Pn, IMIoo) < exp{Gpn^£“‘^}, (28) 

and if assign TMGP priors independently for all elements in f3{s) then 

n (P n P^) < Dpexp{-bn}. (29) 


B.3 Test Constructions 


Lemma 4. Consider y ~ N{Xf3, a'^Im), cl standard linear model with sample size m where 
y = [pi,...,pm]; X is an m X p design matrix satisfying assumption (XI). Consider the 
test function $ = / ^||/3 — / 3 °|| 2 > for testing Hq : (3 = (3^ versus Hi : (3 = /3^, where 
/ 3 ° G and G {/3 G : ||/3 —/ 3 °||oo> £}; /3 = X)~^Xy is the ordinary least square 
estimator. Then for m > we have that 

S ^min 

< exp{-G^mp}, Epjl - 4>] < expl-G^mp}, 
for some Qm > 0 depending on m, where Pq ond Pi represents the probability distributions 
under Hq and Hi. 
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rsj 


Proof. Note that for t = 0,1, under Ht, ||/3 —/3*||2dmin’^/o‘^ < ~ ~ 

Xp, then we have that 


Ep,m=Pj\\P 


Po\\l> X ) ^ 


xl 


> 


e'^pd^ 




4(7^ 


< exp 


E^d 

C- 


16(7^ 




(30) 


where the last inequality is simply due to the f act that -P(Xp > x) — exp{—tx}, VO < 

t < 1/2 by letting t = 1/4. Similarly, 


Ep,[l-^] = Pi{\\P-P%< 


^Vp 


< Pi 




< 




< n(ll/3-/3'll2>-^ + ll/3“-/3‘||2) 


< n(ii/3-/3'ii2> 


fVl 

2 

log 2 


Dehne 12^ = here and below. Notice that m > is equivalent to > 0, 


we complete the proof. 


□ 


Lemma 5. We consider uq locations Si,...,s„q and define Pi = [/3i(sj),...,/9p(sj)]''' (P^ and 
Pi can he defined accordingly). Suppose that we have observed the data yi = [ 2 /i(si),..., ym{si)Y 
generated from the SVCM, i.e. yi ~ N{XPi,a^Im). Consider testing Hq : Pi = P^,i = 
l,...,nQ versus Hi -. Pi = PjC = l,...,?7,o where WP] —/3/||oo> ^ for all i = 1,...,7t,o. 
Define ^i = I ^||/3i —/ 3 /|| 2 > with Pi = X)~^Xyi. Then for the test function 

4> = / (X]r=i xoe have that 

-Epo[4>] < exp{-Cno}, Ep^l - !>] < exp{-Cno} 

Proof. By the results from Lemma 4, Ep^^i] < and L^pjl — $*] < for all 

i = I ,..., pq. Then 

( no no \ 

^ 4 , _ ^ Bp. [4.1 >f- noe-"”-”' , (32) 

i=l i=l j 

( no \ f 

- *■) a Y s -Pi 52(1 - *■) - E ^[1 - *■! a y - 

i=l ^ ) \i=l i=l 

( 33 ) 
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By the Hoeffding inequality (Hoeffding, 1963), the right hand side of both (32) and (33) 
are bounded by exp | — p) | ^ _ 2 g-o,„mp ^ g_ That is, -Epo[$] < 


exp{—Crio}, -E'pjl — “h] < exp{—C po} where C = 


^]^_2g-nm>np)2 


□ 


Lemma 6. For two functions (3^ {s), (3^ [s) e V, if\\(3^{s)—f3^{s)\\i= Jgg^|/S°(s)—/3^(s)|P„(ds) > 
e, we have that P„(|/9°(s) — /3^(s)|> |) > d where Q < d <1 is a constant. That is, the set 
{s G {si,..., s„} : |/5°(s) — /9^(s)|> |} has uq > dn — 1 elements. 

Proof. Let S = {s eTZ : |/3°(s) — /9^(s)|> |}, then 
£ < [ |/9°(s) -/3^(s)|Pn(cis) 

= / \f3^s)-(3\s)K{ds)+ [ \f3%s) - f3\s)Kids) 

J s^S sG7^\(S 

< (M„ + Afi)P„ {\p\s) - ,3‘(s)l> I) + 

where Pn(7^) = 1; Mq = ||/S°(s)||oo and Mi = ||/?^(s)||oo are hnite constants due to absolute 
continuity. Thus P„ (|/3°(s) - /?^(s)|> |) > d by letting d = 2 (Mo+Mi) • ^ 

Lemma 7. There exists a test <h/ 3 y /30 for testing Hq : f3{s) = /3°(s) against Hi : f3{s) = 
(3^{s) where ||/3^(s) —/3°(s)||i^oo> ^ our proposed SVCM, such that 

Epo[^i 3 \po] < exp{-Cn}, Epjl - <h/ 3 i,/ 3 o] < expl-Gn}, 
for some constant C with Pq and Pi corresponding to the probability distributions under Hq 
and Hi. 

Proof. For two vector-valued functions (3\s) = [/3((s),...,/3p(s)]''', t = 0,1, if ||/3^(s) — 
/3°(s)||i,oo> £, we must have at least one k e {l,...,p}, such that ||/9fc(s) —/3°(s)||i> e, then 
due to Lemma 6, we can hnd Hq > dn—1 elements in {si,..., such that |/3^(s)—/9°(s)|> 
Without loss of generality, we denote these points as Si,..., Then for all Si,i = 1, ...,no, 
we have that 11/3^5*) - /3°(sj)lloo> |- 

Now dehne the set = {s G {si,...,s„} : ||/3^(sj) —/3°(Si)||oo> Then Pq = 

|‘5/3y/3o|> dn — 1. Dehne the test function 

= E «w>? 
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where $(«) = / ^||(X^X) ^Xy{s) —/3°(s)||2> Then by Lemma 5 (replacing e by 

e/2) we have 

Epo[^f3\i3o] < exp{-Cono}, < expl-Cono}, 

where Cq > 0 is a constant. Since uq > c/n — 1 for a positive constant c', we have that 
-^^'o[‘^/3h/3°] — exp{—Cn} and < exp{—Cn}. □ 

Lemma 8. There exists a test T for testing Hq : /3(s) = /3°(s) against Hi : /3(s) G = 
Uf n P = {(3{s) G Pn ■ ||/3(s) — /3°(s)||> e} in our proposed SVCM, such that 
Epo/^] < exp{-don}, Ep^[l - T] < exp{-din}, 
for some constant do, di with Pq and Pi corresponding to the probability distributions under 
Ho and Hi. 

Proof. Let Af = iV(|, P„, ll'Hoo) be the covering number of P„, by e:/2-balls under the supreme 
norm. Then for all/3(s) G we can hnd/3-^(s), j G {1,..., A/"} such that ||/3-^(s)—/3(s)||oo< 
|, which implies that ||/3^(s) - /3°(s)||oo> ||/3°(s) - /3(s)||oo-||/3^(s) - /3(s)||oo> | for all 
j = 1,..., A/”. Following the notations and results in Lemma 7 with regard to e/2, we have 
that the tests <F/ 3 y /30 all satisfy that i7^o[<F^y^o] < exp{—din} and [<F^jygo] < exp{—din} 
for some constant di. Now for the test function 

T = max ^o, 

which only depend on the set P„ instead of specihc (3{s) in the alternative hypothesis, 

Bftl'i'] < E Epii^pppo] < A/'expj—din} < expjCpn^pe: — din} < exp{—don}, 
i=i 

d 

for some constant do due to (28) and the fact that n^ = o(n). At the same time 

Epjl - T] < Ei 3 i [1 - <F/ 31 ,/ 3 o] < exp{-din}, 

which complete our proof. □ 

Now based on Lemma 1, equation (29) and Lemma 8, Theorem 2 follows from a direct 
application of Theorem A.l. of Choudhuri et ah (2004). 

C Details about the MCMC algorithm 
We list the details about our MCMC algorithm here. Denote by 0(-;^, S) the density 
function of N{y,,'E). We normally fix p = tc = 0.001 in the inverse-gamma priors and fix 
= 1 for the local GPs. 
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• Updating /3(sj), i = given the block structures of /3(sj), we update /3f = 

k = 1, (7 = 1, •••, G separately with p x G M-H steps. Specihcally, 

the full conditional [/3^ | f3,u,a‘^, A, 3^] is proportional to 


M/3f) = 


Kg) , 


n n 0 \yj_k{si)]Xjkh{si)Ix^[(3k{si)\,a‘^ 

\J.-. Si&Zg j = l 

where yj-k{si) = yj{si) - Y.tj^k^ 3 th{si)hA^k{si)\- We adopt a Metropolis-Hasting 

(M-H) algorithm to update /3f by first generating a proposal, /3^ -|- A/3^ with a zero 

mean Gaussian fluctuation A/3^. Then we set /3^ ^ /3^ -|- A/3^ with probability: 
r M/3^+A^U 

h(Pi) 


• Updating draw from its full conditional [cr^ | /3,A,3^] which is Inv-Ga(ao- 2 , 60 - 2 ) 
where a „2 = 0.001 -F ^ and 6^2 = 0.001 -h | (%(«) “ *J5 'a[/3(s*)]) . 


• Updating A; we sequentially update Ai,...,Ap with M-H algorithms. Specihcally, for 
Afc, the full conditional [A^ | A_fc,/3, 3 ^] is 


h{\k) 


nn 0 {yj-k{s),Xjkh{si)ht, [/3fc(si)], 

.1=1 j=i 


n(Afc), 


where n(Afc) is the uniform empirical Bayes prior for \k dehned in the previous section. 
The proposal for Afc is generate from zero mean Gaussian huctuations as \k + A\k, 
which will be accepted with probability: min |l, 


• Updating {uk}k=i- we sequentially update Ui,...,Up by drawing from their full con¬ 
ditionals [uk I /3,t|]. Specihcally, we update Uk by drawing from N S^n,) where 

with Z = diag(Ci,...,CL)- 


• Updating we sequentially update r^, ...,Tp by drawing from their full condi¬ 

tionals [r| I Uk]. Specihcally, we update by drawing from Inv-Ga(a.r 2 , where 
0.^2 = 0.001 -|- ^ and 6.^2 = 0.001 -|- \uTZ~^Uk. 

• Updating the spatial range parameter b within the SE kernel: this parameter can be 
updated by discretization. Specihcally, within a reasonable range of b, we can calculate 
and store the dictionaries of ifi and Q, the kernel expansion results, with regard to each 
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discrete values of 6 on a grid basis. Then we can update b based on grid search within 
each MCMC iteration. 
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